Let's apply PCA to analyze voting patterns



In [1]:

    
import matplotlib
matplotlib.use('nbagg')
%matplotlib inline



In [2]:

    
import graphlab as gl

Load reviews data using SFrame's powerful unstructured data handling capabilities



In [3]:

    
reviews = gl.SFrame.read_csv('../data/yelp/yelp_training_set_review.json', header=False)
reviews = reviews.unpack('X1','')
reviews = reviews.unpack('votes', '')
reviews['total_votes'] = reviews['funny'] + reviews['cool'] + reviews['useful']
reviews









    



[INFO] This commercial license of GraphLab Create is assigned to engr@dato.com.

[INFO] Start server at: ipc:///tmp/graphlab_server-28311 - Server binary: /Users/alicez/.graphlab/anaconda/lib/python2.7/site-packages/graphlab/unity_server - Server log: /tmp/graphlab_server_1443318612.log
[INFO] GraphLab Server Version: 1.6.1






    



PROGRESS: Finished parsing file /Users/alicez/Documents/training/Strata NYC 2015/data/yelp/yelp_training_set_review.json
PROGRESS: Parsing completed. Parsed 100 lines in 0.898333 secs.
------------------------------------------------------
Inferred types from first line of file as 
column_type_hints=[dict]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
PROGRESS: Read 55824 lines. Lines per second: 32774.1
PROGRESS: Finished parsing file /Users/alicez/Documents/training/Strata NYC 2015/data/yelp/yelp_training_set_review.json
PROGRESS: Parsing completed. Parsed 229907 lines in 4.57022 secs.






    Out[3]:





    
        business_id
        date
        review_id
        stars
        text
        type
    
    
        9yKzy9PApeiPPOUJEtnvkg
        2011-01-26
        fWKvX83p0-ka4JS3dc6E5A
        5
        My wife took me here on
my birthday for break ...
        review
    
    
        ZRJwVLyzEJq1VAihDhYiow
        2011-07-27
        IjZ33sJrzXqU-0X6U8NwyA
        5
        I have no idea why some
people give bad reviews ...
        review
    
    
        6oRAC4uyJCsJl1X0WZpVSA
        2012-06-14
        IESLBzqUCLdSzSqm0eCSxQ
        4
        love the gyro plate. Rice
is so good and I also ...
        review
    
    
        _1QQZuf4zZOyFCvXc0o6Vg
        2010-05-27
        G-WvGaISbqqaMHlNnByodA
        5
        Rosie, Dakota, and I LOVE
Chaparral Dog Park!!! ...
        review
    
    
        6ozycU1RpktNG2-1BroVtw
        2012-01-05
        1uJFq2r5QfJG_6ExMRCaGw
        5
        General Manager Scott
Petello is a good egg!!! ...
        review
    
    
        -yxfBYGB6SEqszmxJxd97A
        2007-12-13
        m2CKSsepBCoRYWxiRUsxAg
        4
        Quiessence is, simply
put, beautiful.  Full ...
        review
    
    
        zp713qNhx8d9KCJJnrw1xA
        2010-02-12
        riFQ3vxNpP4rWLk_CSri2A
        5
        Drop what you're doing
and drive here. After I ...
        review
    
    
        hW0Ne_HTHEAgGF1rAdmR-g
        2012-07-12
        JL7GXJ9u4YMx7Rzs05NfiQ
        4
        Luckily, I didn't have to
travel far to make my ...
        review
    
    
        wNUea3IXZWD63bbOQaOH-g
        2012-08-17
        XtnfnYmnJYi71yIuGsXIUA
        4
        Definitely come for Happy
hour! Prices are amaz ...
        review
    
    
        nMHhuYan8e3cONo3PornJA
        2010-08-11
        jJAIXA46pU1swYyRCdfXtQ
        5
        Nobuo shows his unique
talents with everything ...
        review
    


    
        user_id
        cool
        funny
        useful
        total_votes
    
    
        rLtl8ZkDX5vH5nAx9C3q5Q
        2
        0
        5
        7
    
    
        0a2KyEL0d3Yb1V6aivbIuQ
        0
        0
        0
        0
    
    
        0hT2KtfLiobPvh6cDC8JQg
        0
        0
        1
        1
    
    
        uZetl9T0NcROGOyFfughhg
        1
        0
        2
        3
    
    
        vYmM4KTsC8ZfQBg-j5MWkw
        0
        0
        0
        0
    
    
        sqYN3lNgvPbPCTRsMFu27g
        4
        1
        3
        8
    
    
        wFweIWhv2fREZV_dYkz_1g
        7
        4
        7
        18
    
    
        1ieuYcKS7zeAv_U15AB13A
        0
        0
        1
        1
    
    
        Vh_DlizgGhSqQh4qfZ2h6A
        0
        0
        0
        0
    
    
        sUNkXg8-KFtCMQDV6zRzQg
        0
        0
        1
        1
    

[229907 rows x 11 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

PCA



In [4]:

    
from sklearn.decomposition import PCA
import pandas as pd

Convert to Pandas Dataframe for PCA



In [5]:

    
data = reviews[['funny','cool','useful']].to_dataframe()

Run PCA



In [6]:

    
pca = PCA(n_components=3)
pca.fit(data)









    Out[6]:





PCA(copy=True, n_components=3, whiten=False)

Matplotlib Incantations



In [7]:

    
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt

def plot_figs(ax, data, pca):
    ax.set_xlabel('funny')
    ax.set_ylabel('cool')
    ax.set_zlabel('useful')
    a = data.funny - data.funny.mean()
    b = data.cool - data.cool.mean()
    c = data.useful - data.useful.mean()
    ax.scatter(a[::20], b[::20], c[::20], marker='+', alpha=1)

    pca_score = pca.explained_variance_ratio_
    V = pca.components_

    x_pca_axis, y_pca_axis, z_pca_axis = 2 * V.T * pca_score / pca_score.min()
    ax.plot(xs=(0, x_pca_axis[0]), ys=(0, y_pca_axis[0]), zs=(0, z_pca_axis[0]), color='r', linewidth=4)
    ax.plot(xs=(0, x_pca_axis[1]), ys=(0, y_pca_axis[1]), zs=(0, z_pca_axis[1]), color='g', linewidth=4)
    ax.plot(xs=(0, x_pca_axis[2]), ys=(0, y_pca_axis[2]), zs=(0, z_pca_axis[2]), color='y', linewidth=4)
    return None

Make pretty PCA Pictures



In [8]:

    
elev = 30
azim = 20
fig = plt.figure(1, figsize=(16, 12))
ax = fig.add_subplot(111, projection='3d')
ax.view_init(elev, azim)
plot_figs(ax, data, pca)

But Really, the objective is interpretation



In [9]:

    
pca = PCA(4)
pca.fit(reviews[['funny','cool','useful','stars']].to_dataframe())
print(pca.explained_variance_ratio_) 
pca.components_









    



[ 0.77406014  0.11124671  0.08121511  0.03347804]






    Out[9]:





array([[ 0.50067068,  0.5692005 ,  0.65214322, -0.00699229],
       [ 0.19674739, -0.15560552, -0.02560983, -0.96768875],
       [-0.81047211,  0.08415487,  0.54670603, -0.19278336],
       [-0.23184971,  0.80294189, -0.52456256, -0.16237045]])

Let's try to interpret this.

1st Component

The primary component says that most of variation in the data (77%), is explained reviews which are funny, cool, useful, but has nothing to do with stars. In other words, funny, cool, useful are correlated, but the amount of those votes does not tell you anything about stars.

2nd Component

The second component explains 11% of the variation of the data and it says that "funny reviews" are anti-correlated with "stars" and somewhat anti-correlated with "cool". There's a nearly negligible negative correlation with "useful."

business_id	date	review_id	stars	text	type
9yKzy9PApeiPPOUJEtnvkg	2011-01-26	fWKvX83p0-ka4JS3dc6E5A	5	My wife took me here on my birthday for break ...	review
ZRJwVLyzEJq1VAihDhYiow	2011-07-27	IjZ33sJrzXqU-0X6U8NwyA	5	I have no idea why some people give bad reviews ...	review
6oRAC4uyJCsJl1X0WZpVSA	2012-06-14	IESLBzqUCLdSzSqm0eCSxQ	4	love the gyro plate. Rice is so good and I also ...	review
_1QQZuf4zZOyFCvXc0o6Vg	2010-05-27	G-WvGaISbqqaMHlNnByodA	5	Rosie, Dakota, and I LOVE Chaparral Dog Park!!! ...	review
6ozycU1RpktNG2-1BroVtw	2012-01-05	1uJFq2r5QfJG_6ExMRCaGw	5	General Manager Scott Petello is a good egg!!! ...	review
-yxfBYGB6SEqszmxJxd97A	2007-12-13	m2CKSsepBCoRYWxiRUsxAg	4	Quiessence is, simply put, beautiful. Full ...	review
zp713qNhx8d9KCJJnrw1xA	2010-02-12	riFQ3vxNpP4rWLk_CSri2A	5	Drop what you're doing and drive here. After I ...	review
hW0Ne_HTHEAgGF1rAdmR-g	2012-07-12	JL7GXJ9u4YMx7Rzs05NfiQ	4	Luckily, I didn't have to travel far to make my ...	review
wNUea3IXZWD63bbOQaOH-g	2012-08-17	XtnfnYmnJYi71yIuGsXIUA	4	Definitely come for Happy hour! Prices are amaz ...	review
nMHhuYan8e3cONo3PornJA	2010-08-11	jJAIXA46pU1swYyRCdfXtQ	5	Nobuo shows his unique talents with everything ...	review

user_id	cool	funny	useful	total_votes
rLtl8ZkDX5vH5nAx9C3q5Q	2	0	5	7
0a2KyEL0d3Yb1V6aivbIuQ	0	0	0	0
0hT2KtfLiobPvh6cDC8JQg	0	0	1	1
uZetl9T0NcROGOyFfughhg	1	0	2	3
vYmM4KTsC8ZfQBg-j5MWkw	0	0	0	0
sqYN3lNgvPbPCTRsMFu27g	4	1	3	8
wFweIWhv2fREZV_dYkz_1g	7	4	7	18
1ieuYcKS7zeAv_U15AB13A	0	0	1	1
Vh_DlizgGhSqQh4qfZ2h6A	0	0	0	0
sUNkXg8-KFtCMQDV6zRzQg	0	0	1	1